Deterministic Indexing for Packed Strings

Authors

  • Philip Bille
  • Inge Li Gørtz
  • Frederik Rye Skjoldjensen
Abstract

Given a string S of length n, the classic string indexing problem is to preprocess S into a compact data structure that supports efficient subsequent pattern queries. In the deterministic variant, the goal is to solve the string indexing problem without any randomization (at preprocessing time or query time). In the packed variant, the strings are stored with several characters in a single word, giving us the opportunity to read multiple characters simultaneously. Our main result is a new string index in the deterministic and packed setting. Given a packed string S of length n over an alphabet of size σ, we show how to preprocess S in O(n) (deterministic) time and O(n) space such that, given a packed pattern string of length m, we can support queries in (deterministic) time O(m/α + log m + log log σ), where α = w / log σ is the number of characters packed in a word of size w = Θ(log n). Our query time is always at least as good as the previous best known bounds, and whenever several characters are packed in a word, i.e., log σ ≪ w, the query times are faster.
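To make the packed setting concrete, here is a minimal Python sketch of the representation only; it is not the index from the paper. It assumes characters are given as integer codes in range(σ), that each character takes ⌈log σ⌉ bits, and that a word holds α = ⌊w / ⌈log σ⌉⌋ characters. The function names `pack` and `packed_lcp` are hypothetical and chosen for illustration. The second function shows how one XOR per word compares α characters at once.

```python
def pack(codes, sigma, w=64):
    """Pack integer character codes (each in range(sigma)) into w-bit words.

    Returns (words, bits, alpha), where bits is the number of bits per
    character and alpha is the number of characters stored per word.
    """
    bits = max(1, (sigma - 1).bit_length())  # ceil(log2(sigma)), at least 1
    alpha = w // bits                        # characters per word
    words = []
    for start in range(0, len(codes), alpha):
        word = 0
        for j, c in enumerate(codes[start:start + alpha]):
            # Character j of this block occupies bits [j*bits, (j+1)*bits).
            word |= c << (j * bits)
        words.append(word)
    return words, bits, alpha


def packed_lcp(a, b, len_a, len_b, bits, alpha):
    """Length (in characters) of the longest common prefix of two packed strings.

    a and b are word lists produced by pack with the same bits and alpha;
    len_a and len_b are the original lengths in characters.
    """
    limit = min(len_a, len_b)
    for i, (x, y) in enumerate(zip(a, b)):
        diff = x ^ y  # one word operation compares alpha characters
        if diff:
            # The lowest set bit of diff lies inside the first mismatching character.
            first = ((diff & -diff).bit_length() - 1) // bits
            # Clamp to limit in case the difference is only in padding.
            return min(limit, i * alpha + first)
    return limit
```

With σ = 4 (for example DNA) and w = 64, each word holds 32 characters, so a pattern of length m is scanned in about m/32 word comparisons. This is the kind of word-level parallelism that the packed setting makes available.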

Similar resources

Probabilistic Threshold Indexing for Uncertain Strings

Strings form a fundamental data type in computer systems. String searching has been extensively studied since the inception of computer science. Increasingly many applications have to deal with imprecise strings or strings with fuzzy information in them. String matching becomes a probabilistic event when a string contains uncertainty, i.e. each position of the string can have different probable...

Deciding Indexing Strings with Statistical Analysis

Deciding on indexing strings is important for information retrieval. Ideally, the strings should be the words that represent the documents or query. Although each single word may be the first candidate for an indexing string in an English corpus, it may not be ideal due to the existence of compound nouns, which are often good indexing strings and which depend on the genre of the corpus. The situation is even wor...

A Generalized Approach for Image Indexing and Retrieval Based on 2-D Strings

2-D strings is one of a few representation structures originally designed for use in an IDB environment. In this paper, we propose a generalized approach for 2-D string based indexing which avoids the exhaustive search through the entire database of previous 2-D strings based techniques. The classical framework of representation of 2-D strings is also specialized to the cases of scaled and unsc...

Raz-McKenzie simulation with the inner product gadget

In this note we show that the Raz-McKenzie simulation algorithm, which lifts deterministic query lower bounds to deterministic communication lower bounds, can be implemented for functions f composed with the Inner Product gadget IP(x, y) = ∑_i x_i y_i mod 2 of logarithmic size. In other words, given a function f : {0, 1}^n → {0, 1} with deterministic query complexity D(f), we show that the determi...

Generating Indexing Functions of Regularly Sparse Arrays for Array Compilers

There are many applications involving arrays that contain non-zero components in regular geometric partitions. These include triangular, diagonal, tridiagonal, banded, etc. When computing with arrays of this type, they are usually stored in a packed form and computations are performed with only the non-zero components. This packed form requires an indexing function that maps an index of the arr...

Publication date: 2017